The Author-Topic Model and the author prediction
Abstract
The author-topic model, proposed by Michal Rosen-Zvi et al., is a generative model for documents that extends Latent Dirichlet Allocation to include authorship information. The model associates each author with a multinomial distribution over topics and each topic with a multinomial distribution over words. A document with multiple authors is modeled as a distribution over topics that is a mixture of the distributions associated with its authors. In this project, I re-implement the model and apply it to a collection of about 250 NIPS conference papers, chosen at random from a collection of about 1700 NIPS papers. Exact inference is intractable for these datasets, so I use Gibbs sampling to estimate the topic and author distributions. Tagging results for different numbers of topics are given. After estimating the distributions, I present a new method that applies maximum likelihood estimation to predict the authors of about 100 other papers whose authors are in the same set as those of the training papers. The precision of the prediction is reported. Keywords: author-topic model; Gibbs sampling; multinomial distribution; tagging; author prediction
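To make the estimation step concrete, the following is a minimal sketch of collapsed Gibbs sampling for an author-topic model on toy data. It is not the implementation used in this project: the function name `author_topic_gibbs`, the symmetric Dirichlet hyperparameters `alpha` and `beta`, their default values, and the toy corpus are all assumptions made for illustration. Each token is assigned one of its document's authors and a topic, and both are resampled jointly from the count matrices.

```python
import numpy as np

def author_topic_gibbs(docs, doc_authors, n_authors, n_topics, vocab_size,
                       alpha=0.5, beta=0.01, n_iters=50, seed=0):
    """Toy collapsed Gibbs sampler (illustrative; alpha/beta are assumed priors).

    docs: list of lists of word ids; doc_authors: list of lists of author ids.
    Returns theta (authors x topics) and phi (topics x words).
    """
    rng = np.random.default_rng(seed)
    c_at = np.zeros((n_authors, n_topics))   # author-topic counts
    c_wt = np.zeros((vocab_size, n_topics))  # word-topic counts
    assign = []                              # per-token (author, topic) pairs

    # Random initialisation: pick one of the document's authors and a topic.
    for words, authors in zip(docs, doc_authors):
        doc_assign = []
        for w in words:
            a = authors[rng.integers(len(authors))]
            t = rng.integers(n_topics)
            c_at[a, t] += 1
            c_wt[w, t] += 1
            doc_assign.append((a, t))
        assign.append(doc_assign)

    for _ in range(n_iters):
        for d, (words, authors) in enumerate(zip(docs, doc_authors)):
            authors_arr = np.asarray(authors)
            for i, w in enumerate(words):
                # Remove the token's current assignment from the counts.
                a, t = assign[d][i]
                c_at[a, t] -= 1
                c_wt[w, t] -= 1
                # Word given topic, shape (n_topics,).
                p_w = (c_wt[w] + beta) / (c_wt.sum(axis=0) + vocab_size * beta)
                # Topic given author, shape (n_doc_authors, n_topics).
                rows = c_at[authors_arr]
                p_a = (rows + alpha) / (rows.sum(axis=1, keepdims=True)
                                        + n_topics * alpha)
                # Sample the (author, topic) pair jointly.
                p = (p_a * p_w).ravel()
                k = rng.choice(p.size, p=p / p.sum())
                a, t = authors_arr[k // n_topics], k % n_topics
                c_at[a, t] += 1
                c_wt[w, t] += 1
                assign[d][i] = (a, t)

    # Point estimates of the author-topic and topic-word distributions.
    theta = (c_at + alpha) / (c_at.sum(axis=1, keepdims=True) + n_topics * alpha)
    phi = ((c_wt + beta) / (c_wt.sum(axis=0) + vocab_size * beta)).T
    return theta, phi

# Hypothetical toy corpus: three documents, two authors, six word types.
docs = [[0, 1, 0, 2], [3, 4, 3, 5], [0, 3, 1, 4]]
doc_authors = [[0], [1], [0, 1]]
theta, phi = author_topic_gibbs(docs, doc_authors, n_authors=2,
                                n_topics=2, vocab_size=6)
```

Each row of `theta` is one author's distribution over topics and each row of `phi` is one topic's distribution over words. The sketch recomputes column sums for every token, which is acceptable only at toy scale; a practical sampler would maintain running totals.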
Related resources
A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure
Author profiling is a text classification technique, which is used to predict the profiles of unknown text by analyzing their writing styles. Author profiles are the characteristics of the authors like gender, age, nativity language, country and educational background. The existing approaches for Author Profiling suffered from problems like high dimensionality of features and fail to capture th...
Joint Author Sentiment Topic Model
Traditional works in sentiment analysis and aspect rating prediction do not take author preferences and writing style into account during rating prediction of reviews. In this work, we introduce Joint Author Sentiment Topic Model (JAST), a generative process of writing a review by an author. Authors have different topic preferences, ‘emotional’ attachment to topics, writing style based on the d...
The Crisis of Representation in Azadeh Khanoom and Her Author by Reza Baraheni
The crisis of representation is a topic widely discussed in critique and theory of postmodern literature. This refers to the crises of the present era including the crisis of meaning, the perplexity of contemporary humankind amidst a mass of valid and invalid data, alienation, etc. Literature, as the epitome of human life, is a reflection of these crises in the contemporary era. Azadeh Khanoom ...
Application of Artificial Neural Networks and Support Vector Machines for carbonate pores size estimation from 3D seismic data
This paper proposes a method for the prediction of pore size values in hydrocarbon reservoirs using 3D seismic data. To this end, an actual carbonate oil field in the south-western part of Iran was selected. Taking real geological conditions into account, different models of reservoir were constructed for a range of viable pore size values. Seismic surveying was performed next on these models. F...
A Heuristic Model for Predicting Bankruptcy
Bankruptcy prediction is one of the major business classification problems. The main purpose of this study is to investigate Kohonen self-organizing feature map in terms of performance accuracy in the area of bankruptcy prediction. A sample of 108 firms listed in Tehran Stock Exchange is used for the study. Our results confirm that Kohonen network is a robust model for predicting bankruptcy in ...